SzeKiatTan/nlp-cve-vendor-classification-gpt2

Identifying Vendors via CVE Description Feeds

Finetuning GPT2 for Classification



Common Vulnerabilities and Exposures (CVE) is a list of publicly known cybersecurity vulnerabilities. Each entry contains an identification number, a description, and at least one public reference. The list is published in the National Vulnerability Database (NVD), which is maintained by NIST.

When a new CVE is discovered and published on the NVD, it typically contains a paragraph of text (the 'description') that describes the vulnerability. For example, the description for CVE-2018-17189 reads:

In Apache HTTP server versions 2.4.37 and prior, by sending request bodies in a slow loris way to plain resources, the h2 stream for that request unnecessarily occupied a server thread cleaning up that incoming data. This affects only HTTP/2 (mod_http2) connections.

NVD takes 3-5 business days to fill in the 'vendor' column. In this case the vendor would be apache.


This exercise tries to see whether the vendor can be derived from the description text alone by finetuning the GPT2 model. This would allow automated classification of new CVEs without waiting for NVD to fill in the details.

Main idea: GPT2 is a decoder-only transformer, so it predicts the next token from the hidden state at the last position of the input sequence. Because attention is causal, that position is the only one that has attended to every token in the input, so its hidden state summarizes the whole description. Instead of mapping that hidden state to a next-token distribution (a generation task), we can feed it to a classification head that predicts the vendor (a classification task).
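The last-token idea can be sketched without any model at all. The snippet below is a minimal illustration, not this repository's training code: it assumes right-padded inputs, uses made-up hidden states and head weights, and the names `VENDORS`, `last_token_index`, and `classify_from_last_token` are hypothetical. It picks the hidden state of the last real (non-padding) token and passes it through a linear layer followed by a softmax.

```python
import math

# Hypothetical label set; the real project targets many more vendors.
VENDORS = ["apache", "microsoft"]


def last_token_index(attention_mask):
    """Index of the last real token, assuming padding is on the right."""
    return sum(attention_mask) - 1


def classify_from_last_token(hidden_states, attention_mask, weights, bias):
    """Apply a linear classification head to the last real token's hidden state.

    hidden_states: seq_len x hidden_dim list of per-token vectors
    weights:       num_classes x hidden_dim list of per-class weight vectors
    bias:          num_classes list of per-class offsets
    Returns softmax probabilities over the classes.
    """
    pooled = hidden_states[last_token_index(attention_mask)]
    logits = [
        sum(w_i * h_i for w_i, h_i in zip(w, pooled)) + b
        for w, b in zip(weights, bias)
    ]
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values standing in for decoder outputs (3 real tokens + 1 padding token).
hidden_states = [
    [0.1, 0.2, 0.3],
    [0.0, 0.5, 0.1],
    [1.0, 0.0, 0.0],     # last real token: the only vector the head sees
    [-9.0, -9.0, -9.0],  # padding position, must be ignored
]
attention_mask = [1, 1, 1, 0]
weights = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]  # made-up head parameters
bias = [0.0, 0.0]

probs = classify_from_last_token(hidden_states, attention_mask, weights, bias)
predicted_vendor = VENDORS[probs.index(max(probs))]
```

Selecting the position through the attention mask matters: in a padded batch the final position is often padding, and pooling from it would classify on a vector that carries no information about the description. In an actual finetuning run, the hidden states would come from the GPT2 decoder and the head weights would be learned together with it.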


Previously, an LSTM model was used for this classification task and reached a validation accuracy of 93% on 20 vendors. With GPT2, the goal is to increase the number of vendors (classes) while maintaining high accuracy.
